Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract Molecular phylogenies are a cornerstone of modern comparative biology and are commonly employed to investigate a range of biological phenomena, such as diversification rates, patterns in trait evolution, biogeography, and community assembly. Recent work has demonstrated that significant biases may be introduced into downstream phylogenetic analyses from processing genomic data; however, it remains unclear whether there are interactions among bioinformatic parameters or biases introduced through the choice of reference genome for sequence alignment and variant calling. We address these knowledge gaps by employing a combination of simulated and empirical data sets to investigate the extent to which the choice of reference genome in upstream bioinformatic processing of genomic data influences phylogenetic inference, as well as the way that reference genome choice interacts with bioinformatic filtering choices and phylogenetic inference method. We demonstrate that more stringent minor allele filters bias inferred trees away from the true species tree topology, and that these biased trees tend to be more imbalanced and have a higher center of gravity than the true trees. We find the greatest topological accuracy when filtering sites for minor allele count (MAC) >3–4 in our 51-taxa data sets, while tree center of gravity was closest to the true value when filtering for sites with MAC >1–2. In contrast, filtering for missing data increased accuracy in the inferred topologies; however, this effect was small in comparison to the effect of minor allele filters and may be undesirable due to a subsequent mutation spectrum distortion. The bias introduced by these filters differs based on the reference genome used in short read alignment, providing further support that choosing a reference genome for alignment is an important bioinformatic decision with implications for downstream analyses. These results demonstrate that attributes of the study system and dataset (and their interaction) add important nuance for how best to assemble and filter short-read genomic data for phylogenetic inference.more » « less
-
Abstract The opposing forces of gene flow and isolation are two major processes shaping genetic diversity. Understanding how these vary across space and time is necessary to identify the environmental features that promote diversification. The detection of considerable geographic structure in taxa from the arid Nearctic has prompted research into the drivers of isolation in the region. Several geographic features have been proposed as barriers to gene flow, including the Colorado River, Western Continental Divide (WCD), and a hypothetical Mid-Peninsular Seaway in Baja California. However, recent studies suggest that the role of barriers in genetic differentiation may have been overestimated when compared to other mechanisms of divergence. In this study, we infer historical and spatial patterns of connectivity and isolation in Desert Spiny Lizards (Sceloporus magister) and Baja Spiny Lizards (Sceloporus zosteromus), which together form a species complex composed of parapatric lineages with wide distributions in arid western North America. Our analyses incorporate mitochondrial sequences, genomic-scale data, and past and present climatic data to evaluate the nature and strength of barriers to gene flow in the region. Our approach relies on estimates of migration under the multispecies coalescent to understand the history of lineage divergence in the face of gene flow. Results show that the S. magister complex is geographically structured, but we also detect instances of gene flow. The WCD is a strong barrier to gene flow, while the Colorado River is more permeable. Analyses yield conflicting results for the catalyst of differentiation of peninsular lineages in S. zosteromus. Our study shows how large-scale genomic data for thoroughly sampled species can shed new light on biogeography. Furthermore, our approach highlights the need for the combined analysis of multiple sources of evidence to adequately characterize the drivers of divergence.more » « less
An official website of the United States government
